Programming Contest: SU/V Detection

Roger Jang (張智星)


The goal of this programming contest is to use a classifier to determine whether a given frame has pitch. More specifically, we want to distinguish frames of SU (silence and unvoiced, which do not have pitch) from frames of V (voiced, which do have pitch) using features extracted from each frame of a wave file. We also want to perform input (feature) selection/extraction/normalization for possibly better performance. The exercise represents a typical approach to pattern recognition, so students should pay attention to each step of the procedure.

  1. What to download
    1. Utility, SAP, and Machine Learning Toolboxes from Roger's toolbox homepage.
    2. Baseline example program: exampleProgram.rar
    3. Dataset: The class's recordings of Tang poems (with human-labeled pitch in 5 PV files for each student). To try out the example, you can simply use the wave files within the downloaded folder exampleProgram:
      • trainDataByRoger: Training dataset
      • testDataByByway: Test dataset

  2. How to run the example program:
    1. Change toolboxAdd.m to add the toolboxes to the search path.
    2. Change the variable waveDir in goFeatureCollect.m to point to the wave directory. (For a first-time test drive, you can keep the default wave directory.)
    3. Run goFeatureCollect.m to collect the dataset and save it as DS.mat for classifier design. This program calls the function mySuvFeaExtract.m to extract various features for SU/V detection (a sketch of such per-frame computation follows this list), including:
      • volume: absolute volume in a frame. The smaller, the more likely the given frame is SU.
      • zcr: zero crossing rate in a frame. The higher, the more likely the given frame is SU.
      • frameLocalMinCount: number of local minima in a frame. The higher, the more likely the given frame is SU.
      • relVol: relative volume in a frame. The smaller, the more likely the given frame is SU.
      • amdfLocalMinCount: number of local minima in the AMDF vector of a frame. The higher, the more likely the given frame is SU.
      • amdfLocalMaxDiff: The difference between the max. and min. of the AMDF vector of a given frame. The higher, the more likely the given frame is V.
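
      As a rough illustration of how such per-frame features can be computed, here is a minimal sketch; frame2suvFea is a hypothetical helper written for this illustration, not the actual mySuvFeaExtract.m:

        function fea = frame2suvFea(frame)
        % Compute a few SU/V features from a single frame (column vector)
        frame = frame - mean(frame);                       % remove DC offset
        fea.volume = sum(abs(frame));                      % absolute volume
        fea.zcr = sum(frame(1:end-1).*frame(2:end) < 0);   % zero crossing rate
        isLocalMin = frame(2:end-1) < frame(1:end-2) & frame(2:end-1) < frame(3:end);
        fea.frameLocalMinCount = sum(isLocalMin);          % count of local minima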
    4. Run goDataPlotAll.m to load DS.mat and perform data visualization (using both original and normalized data) via several functions in the Machine Learning Toolbox (a usage sketch follows this list):
      • dsClassSize.m: compute the data count of each class
      • dsProjPlot1.m: plot the classes w.r.t. a single feature
      • dsProjPlot2.m: project the dataset onto a 2D plane
      • dsProjPlot3.m: project the dataset onto a 3D space
      (If you have better ways to visualize the dataset, feel free to let me know.)
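
      A minimal usage sketch for this step, assuming DS.mat holds a dataset structure DS with fields "input" and "output" as used by the Machine Learning Toolbox (the exact calling conventions may differ across toolbox versions):

        load DS.mat                        % dataset collected by goFeatureCollect.m
        dsClassSize(DS)                    % data count of each class
        figure; dsProjPlot1(DS);           % class distributions w.r.t. single features
        figure; dsProjPlot2(DS);           % projection onto a 2D plane
        figure; dsProjPlot3(DS);           % projection onto a 3D space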
    5. Run goInputSelect.m to select 2 features based on KNNC and the LOO (leave-one-out) test. You can change the value of selectionMethod as follows:
      • selectionMethod='sequential' ===> use sequential forward selection
      • selectionMethod='exhaustive' ===> use exhaustive search
      After execution, you should see two figures with the same data:
      • Figure 1 demonstrates how features were selected during the search process.
      • Figure 2 lists the selected inputs based on their recognition rates.
      Moreover, the indices of the 2 selected features are saved to bestInputIndex.mat. Using the default dataset, the program's exhaustive search selects "volume" and "frameLocalMinCount" as the best features, with a recognition rate of 95.9% obtained from the LOO test on 1-NNC. (Note that different datasets may lead to different results.) A plain-MATLAB sketch of this exhaustive search appears after the notes below.

      Note that

      • You can delete the line for input normalization to see if you get a better result.
      • Remember to log the related options (the use of input normalization, the best input index, etc.) in mySuvOptSet.m.
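
      For reference, here is a plain-MATLAB sketch of what exhaustive 2-feature selection with the LOO test on 1-NNC amounts to (goInputSelect.m does this via toolbox functions; the sketch assumes DS.input is dim-by-count and DS.output holds the class labels):

        load DS.mat
        dim = size(DS.input, 1);
        bestRate = 0;
        for i = 1:dim-1
          for j = i+1:dim                          % every pair of features
            X = DS.input([i j], :); y = DS.output;
            correct = 0;
            for k = 1:size(X, 2)                   % leave-one-out loop
              dist = sum((X - X(:,k)).^2, 1);      % squared distances to sample k
              dist(k) = inf;                       % exclude the held-out sample
              [~, nn] = min(dist);                 % its nearest neighbor
              correct = correct + (y(nn) == y(k));
            end
            rate = correct/size(X, 2);
            if rate > bestRate
              bestRate = rate; bestInputIndex = [i j];
            end
          end
        end
        fprintf('Selected features = %s, LOO rate = %.2f%%\n', ...
          mat2str(bestInputIndex), bestRate*100);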
    6. Run goDataPlot2d.m to read bestSelectedInput.mat and show two scatter plots of the original and normalized data, respectively. You can move your mouse over a data point to display the related information (including wave file name and frame index), which can be used for error analysis of misclassified data. (As it turns out, error analysis is the most important step toward improving the recognition rate!)
    7. Run goTrain.m to design a quadratic classifier, which achieves a recognition rate of 94.80%. Since we have selected only 2 features, the following plots will also be displayed (a minimal sketch of the underlying classifier appears after the notes below):
      • The surface of the Gaussian probability density function of each class.
      • Scatter plot of the normalized data in 2D plane, with decision boundary.

      Note that

      • You need to log mu and sigma in suvOptSet.m. (Use "mat2str(mu)" to convert them into strings for easy copy-and-paste.)
      • The classifier-specific parameters cPrm are stored in "cPrm.mat" for future use.
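
      A minimal sketch of the Gaussian (quadratic) classifier behind this step, assuming X holds the 2 selected, normalized features (2-by-count), y holds class labels 1 and 2, and mu/sigma are the quantities to be logged:

        for c = 1:2
          Xc = X(:, y == c);
          mu{c} = mean(Xc, 2);              % class mean; log via mat2str(mu{c})
          sigma{c} = cov(Xc');              % class covariance matrix
        end
        % Class-conditional Gaussian density; pick the class with the larger value:
        g = @(x, m, S) exp(-0.5*(x - m)'*(S\(x - m)))/(2*pi*sqrt(det(S)));
        classify = @(x) 1 + (g(x, mu{2}, sigma{2}) > g(x, mu{1}, sigma{1}));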
    8. Run goTest.m to obtain a recognition rate of 92.80% on another set of test wave files. A scatter plot of the data, together with the decision boundary, will also be shown.
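
      A minimal sketch of this evaluation step; the file name testDS.mat and the fields of cPrm are assumptions for illustration (goTest.m uses the actual stored parameters):

        load cPrm.mat                       % assumed to hold per-class mu and sigma
        load testDS.mat                     % hypothetical test set, same format as DS
        X = testDS.input; y = testDS.output;
        g = @(x, m, S) exp(-0.5*(x - m)'*(S\(x - m)))/(2*pi*sqrt(det(S)));
        pred = zeros(1, size(X, 2));
        for k = 1:size(X, 2)
          pred(k) = 1 + (g(X(:,k), cPrm.mu{2}, cPrm.sigma{2}) > ...
                         g(X(:,k), cPrm.mu{1}, cPrm.sigma{1}));
        end
        fprintf('Test recognition rate = %.2f%%\n', 100*mean(pred == y));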
    (All the above steps are captured in goAll.m for a quick test.)

  3. How to achieve better accuracy (During the lab session, you should add at least 3 features, finish the input selection, and show the classification results using a naive Bayes classifier. Other tasks can be done as take-home work.)
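
     For the naive Bayes part, here is a minimal diagonal-Gaussian sketch (X is dim-by-count, y takes values 1 and 2; each feature is treated as independent within a class):

       for c = 1:2
         Xc = X(:, y == c);
         nbMu(:,c) = mean(Xc, 2);            % per-feature mean
         nbVar(:,c) = var(Xc, 0, 2);         % per-feature variance
         prior(c) = size(Xc, 2)/size(X, 2);  % class prior
       end
       % Log-likelihood of x under class c, summed over independent features:
       logLik = @(x, c) sum(-0.5*log(2*pi*nbVar(:,c)) - (x - nbMu(:,c)).^2 ./ (2*nbVar(:,c)));
       nbClassify = @(x) 1 + (logLik(x, 2) + log(prior(2)) > logLik(x, 1) + log(prior(1)));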

  4. Files to upload for performance evaluation (based on the given dataset and an unseen dataset):

  5. Be aware that